ABSTRACT *


In recent years more and more attention has been paid to digital
image processing, especially as a result of the development of
highly efficient algorithms and also because of technologically
better facilities. Concurrently, attempts were made to find a
mathematical model for human vision to achieve a better understanding
of that mechanism. Some of the image processing problems that
were (and are) tackled are image enhancement, bandwidth reduction,
image transmission, etc. Unfortunately very few have taken the
mechanism of human vision into consideration in their processes.


This work is an attempt to incorporate the model of human
vision in image transmission and coding. An optimal system is
developed to transmit a digital image over a noisy channel. The
same system is used for image bandwidth reduction utilizing a simple
coding scheme which is not based on knowledge of the statistics
of the image in question. We demonstrate the improvement of the
optimal system over other similar systems and provide an explanation
for situations where other systems failed. The model we use for
transmitting images can also be interpreted as a model of the
visual mechanism itself and thus sheds some light on human vision
from a new and interesting aspect.

* This report reproduces a dissertation of the same title
submitted  to  the  Department  of  Electrical  Engineering, 
University  of  Utah,  in  partial  fulfillment  of  the  requirements 
for  the  degree  of  Doctor  of  Philosophy. 
INTRODUCTION


In the last decade digital image processing and digital image
coding have developed considerably. This is a trend caused not only
by the availability of better and more efficient facilities but
mainly by the development of very efficient algorithms
such as the fast Fourier transform and the high speed convolution
algorithm. Successful attempts to deblur and enhance images
digitally have drawn more and more attention to the subject.
Presently the usual goal of image processing is to produce an image
to be looked at by a subjective observer. Although research has been,
and is being, done to find a good model for human vision, very few
have attempted to incorporate those models in the processing of
images. It is beyond doubt that human vision is not equally
sensitive to all spatial frequencies, and it seems that in
processing more emphasis has to be put on the "more important"
frequencies, especially in coding and transmitting images digitally.

The work described here suggests a system for transmitting
images over a noisy channel that incorporates some properties of
human vision. We optimize the system according to some optimization
criterion and then check its performance and compare it with other
available schemes.

The same system is also used for bandwidth reduction. Our
objective in this respect is not to incorporate coding schemes that
rely on statistical properties of the source, since those are
usually unavailable, but rather to use coding schemes that are
independent of the ensemble of images to be coded. Although the
reduction of bandwidth that may be achieved in this manner is
smaller than the reduction achieved by using statistical information,
it was felt that a practical, easy to implement method is preferred.

The mathematical model for human vision which is incorporated
in this work is assumed to be given. Of the models that appear in
the literature two were selected and used throughout this work. It is
shown that the influence of the linear part of the model on the
processing is only minor, and therefore the choice among all
available models (which are of course very close to each other) is
not critical from our point of view. The fact that the linear part
of the visual model has very little effect on the results will also
be discussed.

In chapter one we introduce our model for image transmission
and the motivation behind that model. We optimize the system and
compare it with previously suggested models. Chapter two analyzes
the system and points out its important properties. It is shown that
the model we suggest may also be interpreted, under some
assumptions, as a model of the human vision mechanism itself and
thus provides some interesting insight into human vision. This is
discussed in chapter three utilizing the results obtained in
previous chapters. The problems of implementing the system on a
digital computer are discussed in chapter four, and finally, in
chapter five we introduce a coding scheme, independent of the image
statistics, that fits into our scheme and allows the use of this
optimal system as a means of bandwidth reduction. A comparison
between results obtained by this method and results obtained by
other available schemes is also included in this chapter, which
visually demonstrates the advantages of our optimal system.
CHAPTER ONE

INTRODUCTION  OF  THE  SYSTEM 

In this chapter we shall suggest a system for transmitting
images over a noisy channel that incorporates a model of human
vision in the design. We shall introduce the optimization criterion
and compare the resultant system with similar ones suggested before
by others.

The  system  in  principle  is  described  schematically  in  the 
block  diagram  of  fig  (1.1).  It  consists  of  a preprocessor  that 
processes  the  image,  and  a postprocessor  that  undoes  this  processing 
at  the  other  end  of  the  channel.  The  pre  and  post  processors  will 
depend  on  some  properties  of  human  vision  that  are  represented  by  a 
mathematical  model.  The  characteristics  of  the  channel  will  also 
influence  the  pre  and  post  processors  thus  making  the  entire  system 
most  immune  to  the  disturbances  along  the  transmission  path. 

Recent research shows [6,10] that human vision can be
approximately modelled as consisting of two parts: a nonlinear
memoryless system in cascade with a linear system with memory, as
described in the block diagram of fig (1.2). The function F(·) has
been shown by experiments to be a monotonic increasing convex
function; this means that no information is lost by passing the
signal through F(·). The logarithmic function, satisfying this
constraint, is the most commonly used. This is justified by the
physical properties of light as detailed later in this work.

In order to optimize the pre and post processors we have to
set a norm for the distortion and try to minimize this quantity.
The norm chosen in this work is the mean square error. Although the
criterion of mean square error is not always the best for image
processing [4], it is mathematically (to date) the most tractable,
and was chosen because of that.

Fig (1.3) shows the full system in detail. The quantity $I''$
can be described as the image transmitted from the retina to the
observer's brain. $I_1''$ is the perception of the image if looked at
directly by the observer whereas $I_2''$ is the perception of the image
looked at after being transmitted through the system. The process
itself terminates with the production of $I'$, at which a human
observer will look. The norm we define is

$$d(x,y) = \left( I_2''(x,y) - I_1''(x,y) \right)^2$$

and since the image is, statistically, a member of an ensemble we
shall define

$$M(x,y) = E\{ d(x,y) \}$$

where $E\{\cdot\}$ is the expected value operator. We shall later optimize
the system by minimizing M.
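
The distortion measure above can be sketched numerically. The following is a minimal illustration, not the procedure used in this work: it assumes F(·) is the logarithm, that the linear part of the visual model is applied as a frequency response `V` on the FFT grid, and that the ensemble average E{·} may be replaced by an average over the pixels of one image. The function names are hypothetical.

```python
import numpy as np

def perceive(image, V):
    """Visual model sketch: memoryless F (assumed logarithmic) followed by
    a linear system given by its 2-D frequency response V (FFT layout)."""
    D = np.log(image)                        # D = F(I); image must be positive
    return np.real(np.fft.ifft2(np.fft.fft2(D) * V))

def perceptual_mse(original, received, V):
    """Estimate M = E{(I_2'' - I_1'')^2}, replacing the ensemble average
    by a spatial average over one image (an assumption)."""
    d = (perceive(received, V) - perceive(original, V)) ** 2
    return d.mean()

rng = np.random.default_rng(0)
img = rng.uniform(1.0, 2.0, (8, 8))          # synthetic positive test image
V_flat = np.ones((8, 8))                      # flat visual model, for illustration
print(perceptual_mse(img, img * 1.1, V_flat))
```

With a flat `V` and a logarithmic F, a uniform 10% change in intensity gives the same error at every pixel, which illustrates why the distortion is measured after F(·) rather than on the raw intensities.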

The system presented thus far has no constraints imposed on
it. In this situation there is no optimal solution, since the best
system is obtained by making the preprocessor a pure amplifier
that amplifies I(x,y) to such an extent that the channel noise is
negligible. This of course is not possible in practice. The actual
power produced by the preprocessor is limited. There must
therefore be some constraint imposed on the system to make it
feasible. We chose to limit the average energy (per image) produced
by the preprocessor. This measure is equivalent to limiting the
signal-to-noise ratio of the channel. For a given channel, for
which the signal to noise ratio is given, and for a (statistically)
given noise, the average energy in the signal may be computed.
Imposing this constraint on the system will force the preprocessor
to suppress some frequencies relative to others (some of them may be
totally ignored) so that the best fidelity is achieved. An optimal
solution to the problem with this last constraint imposed exists and
is derived in appendix A.

Taking the model of the visual system as in fig (1.2) we
propose that the preprocessor have the same structure, namely a
memoryless nonlinear system F(·) in cascade with a linear system A,
and that the postprocessor be a linear system B in cascade with the
inverse of the memoryless system of the preprocessor, $F^{-1}(\cdot)$. Fig (1.3) can
now be expanded to look as in fig (1.4).
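
The cascade just described can be sketched as follows. This is only an illustration under stated assumptions: F(·) is taken to be the logarithm, the linear systems A and B are applied as frequency responses on the FFT grid, the channel noise is additive, and the real part is taken after filtering (which presumes filters with the symmetry of a real impulse response). The names `filt` and `transmit` are hypothetical.

```python
import numpy as np

def filt(x, H):
    """Apply a linear system given by its 2-D frequency response H."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * H))

def transmit(I, A, B, noise):
    """Preprocessor (F then A), additive-noise channel, postprocessor
    (B then the inverse of F). F is assumed to be the logarithm."""
    D = np.log(I)                  # memoryless nonlinearity F(.)
    sent = filt(D, A)              # linear system A
    received = sent + noise        # channel adds noise N(x,y)
    D_prime = filt(received, B)    # linear system B
    return np.exp(D_prime)         # inverse nonlinearity, image to be viewed

rng = np.random.default_rng(1)
I = rng.uniform(1.0, 2.0, (8, 8))
A = np.full((8, 8), 2.0)           # toy filters: B is the exact inverse of A
B = np.full((8, 8), 0.5)
I_out = transmit(I, A, B, np.zeros((8, 8)))
```

With a noiseless channel and B equal to the inverse of A the image is recovered unchanged; the optimization discussed in this chapter concerns the choice of A and B when the noise is not zero.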

The reader may question the necessity of making the nonlinear
portion of the preprocessor equal to that of the visual model.
Indeed, in the most general case those functions need not be
identical. Allowing them to differ, however, introduces the problem
of obtaining the second order statistics of the signals involved
(which are essential to the solution of the problem), and these are
impossible to evaluate in the general case.
The reader will readily convince himself that the relation
between $I''(x,y)$ and I(x,y) as described previously can be
schematically expressed in the block diagram of fig (1.5), to which
all later computation will refer.

Since the function F(·) is one-to-one, knowledge of I(x,y)
implies knowledge of D(x,y) and vice versa. We can thus assume
that the whole process starts with the given signal D(x,y). The
relation between $I''(x,y)$ and D(x,y) is linear and analysis becomes
much easier.

Note also that $D'(x,y)$ is not the signal to be viewed. The signal
to be looked at by the observer is $I'(x,y) = F^{-1}[D'(x,y)]$, which does
not appear in fig (1.5).

It seems proper, at this point, to mention two papers closely
related to this subject. One is by Stockham [11], the other is by
Mannos and Sakrison [6]. Stockham, in his early work [12], suggested
a system like that of fig (1.4) but set the linear systems A and B
equal to V and $V^{-1}$ respectively. No optimization was attempted;
rather the argument was as follows.

If the original image I(x,y) is viewed by the visual system we
get, at the output of that system, the signal

$$I_1''(x,y) = F(I) \circledast V \quad \dagger$$

whereas when the image I(x,y) is viewed by the same visual system
after passing through the channel (with channel noise N(x,y)) we
get

$$I_2''(x,y) = I_1''(x,y) + N(x,y)$$

† In this document the $\circledast$ sign denotes the convolution operator.


so that if the noise is white there will be a white disturbance to
the signal sent from the retina. The parameters of the visual model
were based upon psychophysical data available at that time.
The experiments carried out did not produce good looking pictures when
using a visual model that complied with the psychophysical data
available, and the actual model used was a corrected version. No
explanation was given, in terms of the system, for those results.
A more elaborate discussion of the reasons will be given later in
this chapter and in chapter 3.

Mannos and Sakrison carried Stockham's ideas one step further
by trying to optimize the visual model in a subjective way and, at
the same time, allow for some bandwidth reduction. Mannos and
Sakrison defined a distortion criterion and a rate distortion
function, and produced a collection of images processed by their
system with controlled distortion.

The coding scheme, based on the defined rate distortion
function, requires knowledge of the source statistics, which is
usually unavailable. Mannos and Sakrison noticed that a Gaussian
source is the worst to code, i.e. any source will yield a smaller
rate distortion than a Gaussian would; thus by assuming a Gaussian
source they set an upper bound for the rate distortion. Note that
in practice, for implementation of this method, one would need the
statistics of the source.

By controlling the distortion, as described above, Mannos and
Sakrison produced a collection of images that were later viewed by a
group of observers, who refereed the images subjectively. One of
their final results was finding a visual model that consistently
produced better looking pictures in their method. It should be
emphasized that neither of those works optimized a system for a given
visual model (as done in this work) but rather used the visual model
itself as a preprocessor.

Using the calculus of variations and the mean square error
criterion the optimum systems A and B are derived. The detailed
derivation is given in appendix A and in this section we shall only
summarize those results. The linear systems A and B are specified
by their two dimensional frequency response and obey the following
equations:

(1.1)  $$A^2(f) = \begin{cases} \left(\dfrac{S_n}{S_d}\right)^{1/2} \left[ \mu V(f) - \left(\dfrac{S_n}{S_d}\right)^{1/2} \right] & \text{if } \mu V(f) > \left(\dfrac{S_n}{S_d}\right)^{1/2} \\ 0 & \text{otherwise} \end{cases}$$

(1.2)  $$B(f) = \dfrac{A^*(f)}{|A(f)|^2 + S_n(f)/S_d(f)}$$

In the above equations:

$S_d(f)$ is the power spectrum of D(x,y) = F(I(x,y)).

$S_n(f)$ is the power spectrum of the channel noise.

V(f) is the linear portion of the visual model, and is assumed to
be known ([6,10] and others).

$\mu$ is a scalar evaluated to yield the prescribed signal to
noise ratio.

$*$ denotes complex conjugate.
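
A minimal numerical sketch of these two equations is given below. It assumes that equation (1.1) reads $A^2 = r[\mu|V| - r]$ where $r = (S_n/S_d)^{1/2}$ and $\mu|V| > r$, and zero elsewhere, and that equation (1.2) is the Wiener filter $B = A^*/(|A|^2 + S_n/S_d)$. The choice of a real, non-negative A (zero phase) and the name `optimal_filters` are assumptions made for illustration; $S_d$ is taken to be strictly positive.

```python
import numpy as np

def optimal_filters(S_d, S_n, V, mu):
    """Evaluate the assumed forms of (1.1) and (1.2) on a frequency grid.
    S_d, S_n: power spectra (S_d > 0); V: visual model; mu: scalar."""
    r = np.sqrt(S_n / S_d)                     # (S_n/S_d)^(1/2)
    passed = mu * np.abs(V) > r                # frequencies that are transmitted
    A2 = np.where(passed, r * (mu * np.abs(V) - r), 0.0)
    A = np.sqrt(A2)                            # zero-phase choice (assumption)
    B = np.conj(A) / (A2 + S_n / S_d)          # Wiener filter matched to A
    return A, B

# Toy one-dimensional example with three frequency samples.
S_d = np.array([4.0, 1.0, 0.25])               # low-pass signal spectrum
S_n = np.ones(3)                               # white noise
V = np.ones(3)                                 # flat visual model
A, B = optimal_filters(S_d, S_n, V, mu=1.5)
```

In this toy case the third sample, where the noise dominates, is suppressed by A and therefore also by B.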
In all the above and hereafter (f) stands for the two dimensional
frequency $(f_x, f_y)$. The signal to noise ratio is denoted by P.

Special cases such as frequencies for which $S_d(f) = 0$ etc. are
discussed in appendix A.

Although the noise is arbitrary and no constraints are imposed
on its statistics we shall assume in the following section, for the
sake of the qualitative discussion, that it is white.

In fig (1.6) we have an original (called "MILL") and the power
spectrum estimate it generates (on a db scale). Fig (1.7) displays
a second original (called "BARN") and its estimated power spectrum
(on a db scale). Some remarks on the problems involved and the
algorithm used for power spectrum estimation are given in chapter 4.
We tested our method on three different visual models. The
frequency response of the linear portion of these models is given in
fig (1.8). Using these three models and the power spectrum
estimates of fig (1.6b) we calculated the frequency response of the
A and B filters, which are given in fig (1.9) and fig (1.10)
respectively.

The shape of $S_d(f)$ resembles very much the frequency response
of a low pass filter, i.e. most of its energy is in the low
frequencies (both $f_x$ and $f_y$). In images that contain more man-made
objects (like the MILL in fig (1.6)) we find that $S_d(f)$ has
additional energy along the axes. Since by assumption $S_n(f)$ is white,
$(S_n/S_d)^{1/2}$ has essentially the shape of a highpass filter, which is
also the general shape of V(f) up to the point where V(f) starts to
taper off.


Fig. 1.8  Linear portions of various Visual Models:
(a) Slow saturating  (b) Fast saturating  (c) Taper-off

Fig. 1.9  A-filters for various Visual models:
(a) Slow saturating  (b) Fast saturating  (c) Taper-off

Fig. 1.10  B-filters for various Visual models:
(a) Slow saturating  (b) Fast saturating  (c) Taper-off
Now if we take the case where $\mu$ gets larger (i.e. an increase
in the signal to noise ratio) we may approximate

$$A^2(f) \approx \mu V(f) \left(\dfrac{S_n}{S_d}\right)^{1/2}, \qquad B(f) \approx A^{-1}(f)$$

and since $(S_n/S_d)^{1/2}$, as mentioned before, has a shape close to V(f) we
may say that A(f) is proportional to V(f) and B(f) is its inverse.
This justifies the model suggested by Stockham for high signal to
noise ratios. For low signal to noise ratios this will, of course,
yield worse looking images.

Another special case to discuss is the following. We showed
before that V(f) and $(S_n/S_d)^{1/2}$ are essentially highpass, so let us
consider the special case in which

(1.3)  $$V(f) = \left(\dfrac{S_n}{S_d}\right)^{1/2}$$

In this case, substituting equation (1.3) into equation (1.1) and
equation (1.2) we get

(1.4)  $$A^2(f) = (\mu - 1) V^2(f)$$

or

(1.5)  $$A(f) = (\mu - 1)^{1/2} V(f)$$

and

(1.6)  $$B(f) = \left[ (\mu - 1)^{1/2} / \mu \right] V^{-1}(f)$$

Thus we see that A(f) and B(f) are proportional to V(f) and $V^{-1}(f)$
respectively. This is the only case in which the preprocessor is
equal (except for some constant amplification) to the visual model.
This case has some qualitative significance. Discussion of the
implication of these results is given in chapter 3.
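
The substitution behind equations (1.4) to (1.6) can be spelled out step by step. The lines below are a worked sketch only: they assume a real, positive V(f), $\mu > 1$, the shorthand $r = (S_n/S_d)^{1/2}$ (introduced here for brevity), and that (1.1) and (1.2) read $A^2 = r[\mu V - r]$ and $B = A/(A^2 + r^2)$ for real A.

```latex
\begin{align*}
r(f) &= \left(\frac{S_n(f)}{S_d(f)}\right)^{1/2} = V(f)
      && \text{special case (1.3)} \\
A^2(f) &= r\,[\mu V - r] = V\,[\mu V - V] = (\mu - 1)\,V^2(f)
      && \text{(1.4)} \\
A(f) &= (\mu - 1)^{1/2}\,V(f)
      && \text{(1.5)} \\
B(f) &= \frac{A}{A^2 + r^2}
      = \frac{(\mu - 1)^{1/2}\,V}{(\mu - 1)V^2 + V^2}
      = \frac{(\mu - 1)^{1/2}}{\mu}\,V^{-1}(f)
      && \text{(1.6)}
\end{align*}
```

Since $\mu V > V$ whenever $\mu > 1$, the condition attached to (1.1) holds at every frequency in this special case, so no frequency is suppressed.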

Equation (1.1) specifies the system A in terms of its
frequency response (for a given signal to noise ratio affecting $\mu$).
Obviously for large $\mu$ the right hand side of equation (1.1) will
be positive for all frequencies. For small values of $\mu$ there will
be some frequencies that are suppressed. Since A(f) is a
continuous function of $\mu$ there exists a smallest (critical) $\mu$,
denoted by $\mu_c$, for which all frequencies are transmitted. This
notion of the critical $\mu$ will be used later for comparisons.
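
The critical value can be sketched numerically. Assuming the condition in equation (1.1) is $\mu|V(f)| > (S_n/S_d)^{1/2}$, every frequency on a sampled grid is passed once $\mu$ exceeds the largest value of $(S_n/S_d)^{1/2}/|V(f)|$, so that maximum serves as $\mu_c$ in this sketch. The function names are hypothetical, and V(f) is assumed to be nonzero on the grid.

```python
import numpy as np

def critical_mu(S_d, S_n, V):
    """Smallest mu at which no sampled frequency is strictly suppressed,
    under the assumed condition mu*|V| > (S_n/S_d)^(1/2)."""
    return np.max(np.sqrt(S_n / S_d) / np.abs(V))

def suppressed_count(S_d, S_n, V, mu):
    """Number of sampled frequencies zeroed out by A for a given mu."""
    return int(np.sum(mu * np.abs(V) <= np.sqrt(S_n / S_d)))

S_d = np.array([4.0, 1.0, 0.25])
S_n = np.ones(3)
V = np.array([1.0, 2.0, 1.0])
mu_c = critical_mu(S_d, S_n, V)
```

At exactly $\mu = \mu_c$ the limiting frequency still has A(f) = 0; it starts to be transmitted for any $\mu$ slightly above that value.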

The last item to be discussed here is the function F(·). Many
functions have been suggested. The most common one is the
logarithm. There is some physical justification for the use of the
logarithm or a function very close to it (as explained later in
chapter 3). Mannos and Sakrison, using the same structure for the
visual model (i.e. a memoryless nonlinear system followed by a
linear system), noticed in their experiment that it made almost
no difference which function was used in the nonlinear part, with a
slight preference for the cubic root. Since the results of their
experiment are subjective it is hard to comment on or give a physical
interpretation to those results, especially since the functions used
were very close to each other. Experiments done by this author with
square root, logarithm, and some other functions yielded very slight
differences, if any. In the rest of the experiments described here
the logarithm function was used. It is important to remember that
the choice of the function F is completely independent of, and does
not affect, the design of the rest of the system. The answer to the
question of what function F(·) optimizes the system is therefore
subjective. No attempt was made in this work to find the best F(·),
and the choice of the logarithmic function, which is used throughout
this work, is justified by the reasons given in chapter 3.
CHAPTER TWO

ANALYSIS  OF  THE  SYSTEM 

In  the  previous  chapter  we  introduced  the  main  ideas  of  the 
suggested  system.  In  this  chapter  we  shall  analyze  the  components 
of  the  system. 

Looking at equation (1.2) we notice that B(f) is the Wiener
filter for a given A(f). This implies some known facts about B(f)
as follows. For low noise levels B(f) converges to the inverse of
A. Frequencies that are zeroed out by system A will also be zeroed
by system B, etc. The dependence of the system B upon the eye model
is implicit in that A is dependent upon the eye model and B depends
on A. We shall therefore devote the rest of the chapter mainly to
analyzing the system A.

Equation (1.1) shows the dependence of the system A on the
visual model. We mentioned before that the system A, in most
instances, looks very much like the visual model itself.
Qualitatively this should be expected since frequencies important to
human vision are expected to be emphasized over the less
important ones, thus shaping the filter A in the direction of the
visual model chosen. Also one must bear in mind that the
optimization criterion, the least squares estimation, influences the
shape of A no matter what visual model is used.


The question that arises immediately is the one of choosing
an eye model. There are a few models to consider, most of them
suggested previously by various researchers. Stockham proposed two
different models [10,11,12], Mannos suggested another [6], and many
others appear in the literature. Although the visual system has been
shown not to be isotropic, all the models suggested are circularly
symmetric for reasons of ease of implementation. The models used in
this research, although they do not have to be symmetric, were
chosen to be such. In any case all the models suggested are some
sort of high or bandpass filters. This stems from psychophysical
experiments and is justified physically by the desire to separate
illumination from reflectance (which implies emphasis of high
frequencies over low frequencies) and by the imperfection of the
lens system (which causes tapering off at very high frequencies).
We shall elaborate more on this subject in chapter 3.

For a given signal to noise ratio and a given visual model,
the systems A and B are determined (under
the chosen optimization criterion) and are given in equations (1.1)
and (1.2).

It is possible, for a given signal to noise ratio, to vary the
visual model and produce a collection of images to be viewed by
subjective observers, and so determine the "best" visual model for this
system. Preliminary experiments showed that the differences
between the results for the various models tested are very
slight, and a more careful experiment has to be conducted in order
to come up with a conclusive result relating to the visual model.
In fig (2.1) three different processed versions (of the
original of fig (1.6a)) are shown. Each of the three processed
versions was produced using a different visual model with the same
signal to noise ratio of $P_c = 14.5$. Isometric plots of the visual
models used are shown in fig (1.8). Most conspicuous in comparing
the processed versions of fig (2.1) is the fact that the pictures
appear only slightly different from one another. In fig (2.2) we have
the second original (of fig (1.7a)) processed by the system using
the visual model of fig (1.8c) and $P = P_c = 14.5$.

The fact that the system is not very sensitive to changes in
the visual model is somewhat encouraging. The true
visual model is certainly slightly different for different observers. It is
therefore desirable that the system be insensitive to the visual model
so that its performance will be close to optimal for as wide a class
of observers as possible. On the other hand the fact that the
entire system is so slightly dependent on the visual model (in the
final subjective appearance of the images produced) makes the
finding of a "best model" a much harder problem to solve.

The insensitivity to the visual model raises the question of
the importance of incorporating a visual model in the transmission
scheme. In our case there is a slight improvement in performance
when using a visual model over the case where a flat visual model
was used. However, this author believes that the visual model must
be included in any kind of image processing and that its influence
may be enhanced when a different fidelity criterion is used.

Fig. 2.2  Processed versions of BARN using the Taper-off Visual model
and $P = P_c = 14.5$

Some remarks are appropriate here to point out the nature of
the dependence of the systems A and B on their parameters.

From Appendix A we find

(2.1)  $$\mu = (P + 1) \int S_n(f)\,df \Big/ \int |V(f)| \left( S_d(f) S_n(f) \right)^{1/2} df$$

Let us assume that two different experiments are conducted
with the same signal to noise ratio P (that yields $\mu > \mu_c$), but
changing only the visual model in such a way that

(2.2)  $$V_2(f) = K V_1(f)$$

where $V_1(f)$ and $V_2(f)$ are the visual models for the first and second
experiments respectively and K is a positive constant. Using
equation (2.1) we find

(2.3)  $$\mu_2 = \mu_1 / K$$

which in turn yields

(2.4)  $$\mu_2 V_2(f) = [\mu_1 / K] [K V_1(f)] = \mu_1 V_1(f)$$

Looking at equation (1.1) we notice that A(f) depends only on the
product $\mu V(f)$, and considering the result of equation (2.4) we
conclude that for a given signal to noise ratio the system is
invariant under scaling of the visual model. Obviously since
B(f) depends only on A(f) there will be no change in B(f) in this
case.
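
The scaling argument can be checked numerically on a toy spectrum. The sketch below assumes that equation (2.1) reads $\mu = (P+1)\int S_n\,df / \int |V| (S_d S_n)^{1/2} df$, that the integrals may be replaced by sums over a uniform frequency grid, and that the signal to noise ratio P is the ratio of the energy leaving the preprocessor to the noise energy. The function name `mu_from_snr` is hypothetical.

```python
import numpy as np

def mu_from_snr(P, S_d, S_n, V):
    """Discrete form of the assumed equation (2.1); valid only when
    no frequency is suppressed (mu above its critical value)."""
    return (P + 1) * S_n.sum() / (np.abs(V) * np.sqrt(S_d * S_n)).sum()

S_d = np.array([4.0, 1.0, 0.25])
S_n = np.ones(3)
V1 = np.array([1.0, 2.0, 4.0])
K = 3.0
V2 = K * V1                                   # scaled visual model, as in (2.2)
P = 14.5

mu1 = mu_from_snr(P, S_d, S_n, V1)
mu2 = mu_from_snr(P, S_d, S_n, V2)

# Energy leaving the preprocessor, using A^2 * S_d = mu*|V|*sqrt(S_n*S_d) - S_n
# (the assumed form of (1.1) multiplied by S_d).
energy = (mu1 * np.abs(V1) * np.sqrt(S_n * S_d) - S_n).sum()
```

The product of the scalar and the visual model is the same in both experiments, so the A filter computed from it is unchanged, and the energy constraint is met with the prescribed P.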

Another property is the following. Assume that in two
different experiments we use the same visual model but increase the
power of the noise (not changing its power spectral shape) in such a
way that

(2.5)  $$S_n'(f) = K^2 S_n(f)$$

while retaining in both experiments the same signal to noise ratio.
Equation (2.1) yields

(2.6)  $$\mu' = K \mu$$

and after some simple algebra we find

(2.7)  $$A'(f) = K A(f)$$

This means that for a given signal to noise ratio an increase in the
noise level amounts to a corresponding amplification in the system A.
Substituting equations (2.5) and (2.7) into (1.2) reveals that

(2.8)  $$B'(f) = B(f) / K$$

We conclude that the increase in noise level does not affect the
shape (or performance) of the entire system except for a constant
amplification.
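
The noise scaling property can also be verified on a small example. This sketch assumes the same readings as before: (1.1) as $A^2 = r[\mu|V| - r]$ with $r = (S_n/S_d)^{1/2}$, (1.2) as the matched Wiener filter, and (2.1) evaluated as a ratio of sums over a uniform frequency grid. The helper names are hypothetical and A is taken real and non-negative.

```python
import numpy as np

def mu_from_snr(P, S_d, S_n, V):
    # Assumed discrete form of (2.1); valid when no frequency is suppressed.
    return (P + 1) * S_n.sum() / (np.abs(V) * np.sqrt(S_d * S_n)).sum()

def optimal_filters(S_d, S_n, V, mu):
    # Assumed forms of (1.1) and (1.2) with a real, non-negative A.
    r = np.sqrt(S_n / S_d)
    A2 = np.where(mu * np.abs(V) > r, r * (mu * np.abs(V) - r), 0.0)
    A = np.sqrt(A2)
    B = A / (A2 + r ** 2)
    return A, B

S_d = np.array([4.0, 1.0, 0.25])
S_n = np.ones(3)
V = np.array([1.0, 2.0, 4.0])
P, K = 14.5, 3.0

mu = mu_from_snr(P, S_d, S_n, V)
mu_k = mu_from_snr(P, S_d, K ** 2 * S_n, V)   # noise power raised by K^2
A, B = optimal_filters(S_d, S_n, V, mu)
A_k, B_k = optimal_filters(S_d, K ** 2 * S_n, V, mu_k)
```

The product of the two filters, which determines how the signal itself passes through the system, is the same in both cases.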

In the experiments conducted in this research the ensemble of
images was scaled in such a way that D(x,y) (in fig (1.5)) ranges
between 0 and 511. The noise was chosen to range between -51 and
+51 (i.e. the peak value of the noise is 10% of the signal). We have
just shown that the result, for which $\mu > \mu_c$, will not change with a
change in the normalization factor.

We can provide at this point some explanation of the results
encountered by Stockham when using the taper-off visual model.
Stockham used, as mentioned before, V(f) and $V^{-1}(f)$ as the A and B
filters. Thus the preprocessing in his experiment yielded results
close to those obtained by our system. The postprocessing, however,
is quite different since there is no specific noise handling
mechanism in Stockham's scheme, and it therefore produced completely
different results. The problem was that the inverse of the tapered
portion of the visual model amplified parts of the noise, which made
the entire image bad looking. It turns out that the system B is not less
important than the system A and in fact may be more important in
some cases since it processes not only the signal but also the
noise. It is important at least to have the systems A and B be a
matching pair in the Wiener sense. Looking at B(f) for the
various systems (fig (1.10)) reveals that the B filter looks more
like the inverse of the visual model of fig (1.8b) than that of
fig (1.8c). What happened in these experiments is that the inverse
of the visual model of fig (1.8c) enhanced high noise frequencies
that irritated the human observer. Those frequencies should have
been suppressed since in this domain the power of the noise may
exceed that of the image to such an extent that the image is
dominated by noise. A better system that should have been suggested
is one that uses the eye model as the system A and its matching
Wiener filter as system B. At any rate this explanation is now of
minor importance since it is proved in this work that for optimum
results the system A should not be made equal to the visual model.

In general, when using the least squares criterion for
optimization, one should remember, as noticed by Costas [1] and
others, that the importance lies in making the pre and postfilters a
matched pair, and in giving the prefilter the desired general shape.
The fine structure of the filter will usually not improve the
performance of the entire system significantly.

To demonstrate the effect of the signal to noise ratio on the
results the following experiment was conducted. The image of
fig (1.6) was processed (see fig (2.3)) using M = M_c. Using M_c
yields some signal to noise ratio that we term P_c. (For the "MILL"
and the "taper-off" visual model P_c was calculated to be 14.5). The
same image was also processed using P = 2/3 P_c and P = 1.5 P_c. The
A filter for P = 1.5 P_c is essentially a scaled version of the A
filter for P_c. Not so in the case of 2/3 P_c, where the A (and
therefore also the B) filter which resulted looked essentially as
for the P_c case except that some high frequencies (both f_x and
f_y) were zeroed out. The result of the processing with these signal
to noise ratios is given in fig (2.3). Although fig. (2.3b) is
undoubtedly better than fig. (2.3a), the latter is strikingly good in
the preservation of details. In fig. (2.3c) we used a ratio of P = 1,
which is much below the critical power. As can be seen the picture
is much noisier and the dynamic range has decreased, but in spite of
this low signal to noise ratio most of the detail, even the fine
detail, is well preserved, which demonstrates the capability of the
system to transmit images under extremely noisy circumstances.

The improvement one gets when using our system over regular
transmission is demonstrated by comparing fig. (2.3) with
figs. (5.9a) and (5.10a). These latter pictures are statistically
equivalent to what would happen if the original image had been
transmitted through the same noisy channel without the processing
done by the pre and postprocessors.

The choice of what ratio to use depends upon the subjective
distortion one would tolerate. If detail is important one might
reduce the signal to noise ratio and if "cleanliness" of the
resultant picture is important a high ratio is necessary.


The results obtained by calculating the A and B filters as
described in chapter one may shed some light on the mechanism of
human vision.

Although, as mentioned before, the criterion of mean square
error is not always appropriate for image processing, and some of
the results we obtained are strongly dependent upon the
optimization criterion, it still seems that some consideration
should be given to explain the results on a more qualitative basis,
and to try and suggest some new points of view about the nature of
human vision.

Looking at equation (1.1) there are two major components
influencing the design of the system A, namely the visual model V(f)
and the quantity S_n/S_d. If human vision were equally sensitive
to all spatial frequencies, or alternatively if we were not to
consider the visual system at all in the process of optimizing the
pre and postprocessors, we would get results that are worth some
special consideration. Quantitatively this means replacing V(f) in
equation (1.1) with

(3.1)  V(f) = 1  for all frequencies.

An isometric plot of A(f) resulting from such substitution is
shown in fig (3.1a). Comparing the result with the one derived for
a true visual model (fig (1.9)) reveals that the shape of A(f) has
not changed much. This implies that the shape of A(f) is controlled
mainly by the shape of the quantity S_n/S_d. This suggests that the
system is more sensitive to the information itself (implicitly
represented by S_d) and to the channel (characterized by S_n) than to
the mechanism of vision. In this case, where V(f) is absent from
the system, the A filter that results looks very much like some
models of the visual system that are suggested in the literature.
This fact may provide us with some explanation of the behaviour of
human vision.

Many assumptions have been made about the nature of
information transmission from the eyeball to the brain. The one
that seems very logical is that the optic nerve, connecting the
eyeball to the brain, acts like a noisy channel (i.e. that
information is processed in the eyeball and transmitted to the brain
through the noisy optic nerve). If we assume this kind of visual
system it is easy to model such a system. In fact the system we
suggested in chapter one becomes a good model for the entire visual
system. Specifically the preprocessor corresponds to the eyeball in
which initial processing is performed. The processed signal is then
transmitted through the noisy channel, namely the noisy optic nerve.
The brain, at the other end of the channel, is the postprocessor, but
since there is no "visual system" at the other end as in the system
described in chapter one and fig (1.3), the substitution suggested
in (3.1) is appropriate. The result is the system A shown in
fig (3.1). We noticed before the resemblance between the A filter
and the visual model, which strengthens the assumption that part of
the eyeball's processing of images is directed towards a safer
transmission through the optic nerve, much like the task of the
preprocessor in our system. The fact that A(f) is not circularly
symmetric strengthens the observation that the human visual mechanism
is not quite isotropic.

Early experiments in image enhancement showed that a highpass
filter usually reveals or enhances details that could not be clearly
seen in the original image. Apart from the reduction in dynamic
range due to highpass filtering of the log illumination the
following arguments were made. An image is composed of two
components, illumination and reflectance, that are multiplied to form
the image. It is the reflectance that we are interested in since it
describes the objects we are looking at. The illumination has a
strong dependence on light sources, which are slowly varying across
the scene, and has therefore most of its energy in low frequencies.
The reflectance, on the other hand, has considerable amounts of
energy in high frequencies since it "represents" textures, sharp
edges of objects, and fine detail. Unfortunately illumination and
reflectance cannot be absolutely separated because illumination has
some energy in high frequencies and reflectance has some important
low frequency components. This confirms that the separation between
illumination and reflectance cannot be made by a pure highpass
filter but only optimally approximated by a filter that would be


illumination component, is the one we want to get rid of, we realize
that some sort of highpass filtering on Q(x,y) is necessary. (This
is in short the principle of homomorphic image enhancement as
originally stated by Oppenheim et al. [7]). This analysis provides
us with the physical justification for the use of the logarithm as
the nonlinear portion of the visual system.
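
The principle can be sketched in a few lines. The code below is an
illustration under stated assumptions, not the processing used in this
work: the Gaussian-shaped highpass response and the parameter names
cutoff, low_gain and high_gain are hypothetical, and the input is
assumed strictly positive so that the logarithm exists.

```python
import numpy as np

def homomorphic_highpass(image, cutoff=0.05, low_gain=0.5, high_gain=1.5):
    """Logarithm, linear highpass emphasis in the frequency domain, exponential."""
    log_img = np.log(image)                        # the product becomes a sum
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    r2 = fx ** 2 + fy ** 2
    # Gain rises smoothly from low_gain at zero frequency to high_gain.
    H = low_gain + (high_gain - low_gain) * (1.0 - np.exp(-r2 / (2 * cutoff ** 2)))
    filtered = np.fft.ifft2(np.fft.fft2(log_img) * H).real
    return np.exp(filtered)

# Hypothetical test scene: slowly varying illumination times a fine checkerboard.
illumination = np.linspace(0.2, 1.0, 64)[:, None] * np.ones((1, 64))
reflectance = 0.5 + 0.4 * (np.indices((64, 64)).sum(axis=0) % 2)
enhanced = homomorphic_highpass(illumination * reflectance)
```

Because the gain at zero frequency is low_gain, a uniform image of
value c comes out as c raised to the power low_gain, which is the
reduction in dynamic range mentioned earlier.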

The previous arguments pose some interesting questions. We
note that the visual system is a kind of highpass filter. We have
shown that highpass filtering achieves best transmission of images
through the optic nerve and we have also shown that a highpass
(homomorphic) filter achieves good separation between illumination
and reflectance that is desired for good quality vision. This
suggests that the visual mechanism achieves two goals: it separates
reflectance from illumination and encodes the image to be
transmitted in a safer way through the optic nerve. The question is
whether or not these are really two separate issues or two aspects
of the same phenomenon.

At the time of this writing there is no unique decisive answer
to this question. It is the author's belief that the true situation
is some kind of a compromise between the two arguments for the
reasons outlined in the next paragraphs.

If the eye system were based only on safe transmission
considerations we should have ended up with an eye model that obeys
$$V(f) = \left( S_n / S_d \right)^{1/2}$$

since in this case the signal, after passing through filter A, would
have the same statistics as the noise and would therefore be optimally
immune to the noise of the optic nerve. An experiment that was
conducted using that kind of visual model produced very bad looking
pictures. The reason that the results were bad looking is that such a
visual model suppresses low frequencies too much (thus getting rid of
some needed reflectance component) and does not work right for high
frequencies either, because of the high amplification it introduces
in this range. This suggests that the mechanism of vision is not
based upon transmission considerations alone.

The fact that the visual system is not based solely on the
separation of reflectance and illumination can be proved by the fact
that for such a separation we need a highpass filter (like that of
fig (1.8b)) whereas psychophysiological data shows that the visual
system is not really highpass but rather a sort of bandpass, i.e. it
tapers off at the very high frequencies. (This tapering off is not
due to the finite aperture of the eyeball lens, which should have
occurred at a much higher frequency, but because of some imperfectness
in the lens system and retinal imaging surface).

It is therefore the author's belief that human vision is a
compromise between safe transmission of the image through the
(noisy) optic nerve and the desired separation of illumination and
reflectance. Because of the importance of the issue it seems
worthwhile for future researchers to focus more attention on trying
to solve this question in order to acquire better understanding of
human vision and its mathematical modelling.


CHAPTER FOUR

THE  SYSTEM  ON  A DIGITAL  COMPUTER 


The system described and analyzed in the previous chapters and
the equations derived in appendix A are all true for continuous
functions. Because of the isomorphism established between the
continuous and the sampled functions [9] those results are true also
for the sampled case. Note that this isomorphism relates the
continuous-space function to the discrete-space function (i.e.
functions of a continuous or discrete domain) and does not relate to
the question of whether the range of the function is continuous or
discrete. We shall assume that the image in the computer is sampled
fine enough and will serve as our original, and that the equations
apply to this original image. In this chapter we shall outline the
process of implementing the system on a digital computer. The
chapter will not deal with problems that arise because of the
facilities (and their hardware limitations) but rather with some
theoretical problems.

The first problem is the one of estimating the power spectrum
of the image ensemble. It is obvious that the true power spectrum
is not available to us and we have to be satisfied with some
close estimate.


The algorithm used, based on averaging periodograms, was
suggested (for one dimension) by Welch [13]. The picture is broken
up into (possibly overlapping) sections. Each section is windowed
(we used a two dimensional Hanning window) and then the magnitude
square of the discrete Fourier transform of that section is
calculated. This quantity is then averaged over all sections. The
final result is then divided by the energy in the window. An
example of the power spectrum estimate (on a db scale) of the
logarithm of an image is given in fig (1.6) and fig (1.7).
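
The procedure just described can be sketched as follows. This is a
minimal illustration, assuming square sections of 64 samples that
overlap by half; the section size, the step and the function name
welch_psd_2d are hypothetical choices made for this example, not
values reported here.

```python
import numpy as np

def welch_psd_2d(image, section=64, step=32):
    """Average windowed periodograms of (possibly overlapping) square sections."""
    w1 = np.hanning(section)
    window = np.outer(w1, w1)                 # two dimensional Hanning window
    window_energy = np.sum(window ** 2)
    acc = np.zeros((section, section))
    count = 0
    rows, cols = image.shape
    for r in range(0, rows - section + 1, step):
        for c in range(0, cols - section + 1, step):
            block = image[r:r + section, c:c + section] * window
            acc += np.abs(np.fft.fft2(block)) ** 2     # magnitude squared DFT
            count += 1
    return acc / (count * window_energy)      # average, then divide by window energy

# Hypothetical positive test image; the estimate is taken on its logarithm.
rng = np.random.default_rng(0)
image = rng.uniform(0.1, 1.0, (256, 256))
psd = welch_psd_2d(np.log(image))
psd_db = 10 * np.log10(psd + 1e-12)           # db scale for display
```

With this normalization a unit variance white input gives an estimate
that averages to about one over all frequencies.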

The accuracy of such an estimate is discussed in Welch [13].
Using the above algorithm we may take one of three approaches. The
first uses the estimate taken from one image as the estimated power
spectrum of the entire ensemble. The second approach averages
estimates over a collection of images. This author's experience
shows that four or five carefully selected images suffice for such
an estimate. The last approach is to divide the ensemble into
sub-ensembles each of which has a power spectral estimate, and then
to use one of those estimates for the given image. (The
sub-ensembles should depend heavily on the amount of man-made
objects in the images since this is the main contributor to the
differences in the power spectra). The third approach seems to be
the best one. However, it is difficult to automate because of some
unsolved problems in image classification. In general the results
are not very heavily dependent upon the kind of algorithm used. The
image of fig (1.6) was processed by filters designed using the power
spectrum estimate of the image of fig (1.7) and the results are
given in fig (4.1) (compare with the image of fig (2.1c)).

Fig. 4.1  Cross filtering (using P_c)
(a) MILL processed with filters of BARN
(b) BARN processed with filters of MILL

Another important problem is that of windowing. By windowing
we refer to the process of modifying the impulse response of a
system in such a way that its frequency response will change only
slightly. The need for windowing arises mainly because we are
dealing with an infinitely long impulse response that has to be
truncated for the implementation on a (finite memory) digital
computer. Our windowing process was chosen to be the multiplication
of the impulse response of the system with a bell shaped function
whose circumferential points are zero. Undoubtedly no windowing at
all yields bad results, but on the other hand when a(x,y) is windowed
it is no longer the optimal filter. One has to learn, when using
digital computers, to compromise optimality of the solution for the
use of this powerful device. A few experiments were made to find
the best way of windowing out of the following five possibilities:

1.  Window filter A only.

2.  Window filter B only.

3.  Window filters A and B (after the optimal filters have been
calculated).

4.  Window filter A. Calculate filter B as the Wiener filter
of the windowed A.

5.  Window as in (4) but apply the window to filter B as well.

From all these possibilities the one found most satisfactory
is the fifth method suggested above. Whenever windowing was needed
this method was used. Some examples of various windowing procedures
related to the methods 1, 3 and 4 above are shown in fig (4.2). The
window itself is always a two dimensional Hanning window.
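
The fifth method can be sketched as below. This is only an
illustration under assumptions: the filters are given as frequency
responses on an even N by N DFT grid, the postfilter uses the textbook
Wiener form, and the helper names (hann2d_centered,
window_impulse_response, method_five), the window size and the example
spectra are all hypothetical.

```python
import numpy as np

def hann2d_centered(shape, size):
    """2-D Hanning window of the given (even) size, centered on the grid."""
    n = np.arange(size)
    w1 = 0.5 - 0.5 * np.cos(2 * np.pi * n / size)   # zero at n = 0, peak at size // 2
    w = np.zeros(shape)
    r0 = shape[0] // 2 - size // 2
    c0 = shape[1] // 2 - size // 2
    w[r0:r0 + size, c0:c0 + size] = np.outer(w1, w1)
    return w

def window_impulse_response(H, size):
    """Window the impulse response of H and return the new frequency response."""
    h = np.fft.fftshift(np.fft.ifft2(H))            # origin moved to the grid center
    h_windowed = h * hann2d_centered(H.shape, size)
    return np.fft.fft2(np.fft.ifftshift(h_windowed))

def method_five(A, S_d, S_n, size):
    A_w = window_impulse_response(A, size)                    # window filter A
    B = np.conj(A_w) * S_d / (np.abs(A_w) ** 2 * S_d + S_n)   # Wiener filter of windowed A
    B_w = window_impulse_response(B, size)                    # window filter B as well
    return A_w, B_w

N = 64
fx, fy = np.meshgrid(np.fft.fftfreq(N), np.fft.fftfreq(N))
S_d = 1.0 / (1.0 + (np.hypot(fx, fy) / 0.05) ** 2)   # hypothetical image spectrum
S_n = 0.01                                           # hypothetical white noise level
A = (S_n / S_d) ** 0.25                              # hypothetical prefilter (assumption)
A_w, B_w = method_five(A, S_d, S_n, size=32)
```

The point of the ordering is that B is computed from the A that is
actually applied, so the pair stays matched even though neither filter
is the unwindowed optimum.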

The only way to avoid windowing (and its side effects) is by
using periodic convolution, as opposed to aperiodic convolution
which requires windowing. The periodic convolution, although freeing
us from window worries, introduces some other side effects. More
time is needed to estimate large-record power spectra (although this
is a one time problem) and the result converges more slowly. In
general, the results did not differ much when both methods
were compared. A slight preference was noticed for the periodic
convolution. Compare fig (4.3), which used circular convolution, with
fig (2.1c) and fig (2.2c) where aperiodic convolution was performed
with the same set of filters.
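
The two kinds of convolution can be contrasted with a short sketch.
It assumes FFT-based filtering with NumPy; the function names and the
small test arrays are chosen for this example only.

```python
import numpy as np

def periodic_convolve(x, h):
    """Circular convolution: h is zero padded to the size of x and wraps around."""
    H = np.fft.fft2(h, s=x.shape)
    return np.fft.ifft2(np.fft.fft2(x) * H).real

def aperiodic_convolve(x, h):
    """Linear convolution via zero padding; the output is larger than x."""
    shape = (x.shape[0] + h.shape[0] - 1, x.shape[1] + h.shape[1] - 1)
    X = np.fft.fft2(x, s=shape)
    H = np.fft.fft2(h, s=shape)
    return np.fft.ifft2(X * H).real

x = np.arange(16.0).reshape(4, 4)
h = np.array([[1.0, 1.0]])               # two-tap horizontal averaging kernel
circ = periodic_convolve(x, h)           # same size as x, left edge wraps to right
lin = aperiodic_convolve(x, h)           # size 4 by 5, no wrap-around
```

In the periodic case the picture is treated as one period of an
infinite tiling, so a filter of any length can be applied without
truncation, at the price of wrap-around at the borders.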

Another theoretical issue is the sensitivity of B(f) [or
b(x,y)] to the number of bits used to describe it. In general all
the computation was done in standard floating point arithmetic but
experiments were performed to see what kind of accuracy is needed
for b(x,y). The issue is of some importance for implementing the
system on a short word computer or if attempts are made to use fixed
rather than floating point arithmetic. The experiments show that
for 512 by 512 picture elements and filters that are 64 by 64 the
use of five bits per sample of the filter B suffices; four bits give
good results but three bits introduce extra conspicuous distortion.
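
The effect of the word length can be explored with a sketch such as
the following. It assumes a uniform quantizer spanning the range of
the filter samples, which is an assumption of this example rather than
a detail given above; the kernel b used here is a hypothetical smooth
impulse response, not one of the B filters of this work.

```python
import numpy as np

def quantize_filter(b, bits):
    """Uniformly quantize filter samples to 2**bits levels over their range."""
    levels = 2 ** bits
    lo, hi = float(b.min()), float(b.max())
    step = (hi - lo) / (levels - 1)          # assumes b is not constant
    codes = np.round((b - lo) / step).astype(int)
    return lo + codes * step, codes

# Hypothetical 64 by 64 impulse response (a difference of Gaussians).
yy, xx = np.mgrid[-32:32, -32:32]
r2 = xx ** 2 + yy ** 2
b = np.exp(-r2 / 8.0) - 0.5 * np.exp(-r2 / 32.0)

for bits in (3, 4, 5):
    b_q, codes = quantize_filter(b, bits)
    rms = np.sqrt(np.mean((b_q - b) ** 2))
    print(bits, "bits per sample, RMS error", rms)
```

The integer codes are what would be stored or transmitted; the
quantized filter b_q is what the postprocessor would actually apply.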

We would like to pause at this point and make some accounting
for the actual number of bits to be transmitted. Two additional
pieces of information have to be added to the actual data
transmitted. The first one is the average of the image, which is
usually not zero (in the log domain). This is done because some
coding schemes work better on a zero average signal and thus the
original average has to be separately transmitted. Experiments show
that this average is in the range in which 9 bits will usually
suffice. The second and more important issue is the B filter. If
we decide to transmit the B filter along with the image itself the
number of extra bits needed is computed as follows. We
mentioned before that a B filter with 64x64 elements quantized to
5 bits per sample yields good results. Thus the total number
of bits to be added to the transmission is 64x64x5. If the image is
512x512 this means an addition of less than 0.08 bits per picture
element transmitted. Although this quantity is quite small one
would like to avoid this transmission. One way of doing it is
creating a "library" of B filters. We have demonstrated before in
fig (4.1) that the A and B filters need not be matched to the image
in question. Thus one can decide to use A and B filter pairs out of
a "standard" small library so that the only information to be added
to the actual bits of the image is the index into the library, which
is a negligible addition to the total.
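
The accounting above amounts to the following arithmetic. The library
size of 16 filter pairs is a hypothetical figure used only to show the
order of magnitude of the index; it is not a value given in this work.

```python
filter_bits = 64 * 64 * 5            # B filter: 64 by 64 samples at 5 bits each
mean_bits = 9                        # image average in the log domain
pixels = 512 * 512

filter_overhead = filter_bits / pixels                  # 0.078125 bits per picture element
total_overhead = (filter_bits + mean_bits) / pixels     # filter plus average

library_size = 16                                       # hypothetical number of filter pairs
index_bits = (library_size - 1).bit_length()            # bits needed for the library index
index_overhead = (index_bits + mean_bits) / pixels      # overhead when using the library

print(filter_overhead, total_overhead, index_overhead)
```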

The last item we would like to comment on is the noise.
Usually when talking about a channel, there is no control over the
noise. In experimental work one must choose the noise. We
simulated the channel by uniformly distributed zero mean random
numbers. Later on when this scheme is used for image coding we
shall introduce some different kinds of noise more commonly known

CHAPTER FIVE

THE  USE  OF  THE  MODEL  FOR  IMAGE  CODING 

Up to now we considered the system for transmitting images in
a general sense. One other very important use of such a system is
bandwidth reduction, or how to encode digital images with as few
bits per picture element as possible and such that the closest
possible version of the original may be retrieved. In theory, for
a given ensemble of signals we can define a distortion measure d and
a rate distortion function R(d) based on this distortion. It can be
shown [2] that for any distortion d there exists a coding scheme
that will code the image with R(d) bits per sample (or unit area, in
case of images). Unfortunately the creation of this most efficient
code requires the knowledge of the joint probability density
function of the ensemble, which is beyond our reach. Some
assumptions have been made about this function but none was close
enough to give a good approximation. Beyond the knowledge of the
source statistics one needs to define a rate-distortion function.
The author is unaware of any rate distortion function for sources
other than Gaussian. The direct consequence is that although in
theory we can get as low as R bits per sample (for a given
distortion) in practice this is unachievable. The problem is thus
to find a reasonably efficient coding scheme that would be
practical.

One of the first schemes suggested in the image processing
context is the one introduced by Roberts [8]. The algorithm goes as
follows. To the image that has to be encoded we add pseudo random
noise of zero mean and peak value K. The result is then quantized
(in quantization intervals of 2K) and truncated to yield the
desired number of levels. At the other end the pseudo random noise
is subtracted to yield the final retrieved image. This subtraction
introduces a synchronization problem between the sending and
receiving stations which, if possible, should be avoided.
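
The algorithm as just described can be sketched as follows. This is a
hedged illustration and not Roberts' original implementation: it
assumes image values in the interval from lo to hi, uniformly
distributed pseudo random noise, and a shared generator seed standing
in for the synchronization between the two stations. All function and
parameter names are hypothetical.

```python
import numpy as np

def roberts_encode(image, bits, seed, lo=0.0, hi=1.0):
    """Add pseudo random noise of peak K, quantize in intervals of 2K, truncate."""
    levels = 2 ** bits
    step = (hi - lo) / levels                 # quantization interval, equal to 2K
    K = step / 2
    noise = np.random.default_rng(seed).uniform(-K, K, image.shape)
    codes = np.floor((image + noise - lo) / step).astype(int)
    return np.clip(codes, 0, levels - 1)      # truncation to the desired number of levels

def roberts_decode(codes, bits, seed, lo=0.0, hi=1.0, subtract=True):
    """Map codes to interval midpoints and optionally subtract the same noise."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    K = step / 2
    recon = lo + (codes + 0.5) * step
    if subtract:
        # The receiver must regenerate exactly the noise used by the sender.
        recon = recon - np.random.default_rng(seed).uniform(-K, K, codes.shape)
    return recon

image = np.random.default_rng(0).uniform(0.25, 0.75, (64, 64))   # hypothetical input
codes = roberts_encode(image, bits=1, seed=42)
retrieved = roberts_decode(codes, bits=1, seed=42)
```

When no truncation occurs, the retrieved image differs from the input
by at most K, whatever noise sample was drawn.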

The system can be viewed as a "black-box" that produces the
output signal from the input signal by adding noise, quantizing,
truncating and subtracting the same noise again. This black box can
also be modelled as a simple addition of noise (not the one actually
added) to the truncated input signal. Roberts investigated the
statistics of the output signal and states that if truncation is
ignored (or possibly avoided) the equivalent noise (i.e. the
difference between the output and input signals), although different
from the one actually added, has the same distribution. As to the
second order statistics, it has to be determined how much the
equivalent noise is correlated with the original and what its
"color" is.

Lippel and Kurland [5] carried Roberts' ideas one step
further. They questioned the whole idea of using pseudo random
noise and suggested instead using a stylized pattern which is termed
"dither". An attempt was made to find optimal patterns according to
some criteria and a few of the results are given in [5], one of
which is used throughout this work. The idea of the dither is of
importance because of a partial solution it provides to the
synchronization problem. The pattern Lippel and Kurland suggested
consists of a (relatively) small kernel that is repeated over and
over again (the kernel used in this work is a 4x4). This means that
the synchronization is limited to the knowledge of the kernel only
and not to a big record as in the pseudo random noise method.
Lippel and Kurland also noticed that using dither without the
subtraction at the other end yields satisfactory results. In the
work done here the use of dither proved to be superior to the use of
pseudo random noise, especially in the case where no subtraction was
performed. Lippel and Kurland, interested only in the final
appearance of the image, did not investigate the statistical
properties of the resultant image, which are obviously different from
those obtained by Roberts' method.
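
A repeated 4x4 kernel can be sketched as below. The kernel used here
is the widely known 4x4 ordered-dither (Bayer) matrix, taken as a
stand-in: it is an assumption of this example and not necessarily the
pattern from [5] used in this work. The function names are likewise
hypothetical.

```python
import numpy as np

# Stand-in 4x4 ordered-dither index matrix (assumption, see the text above).
BAYER_4X4 = np.array([[0, 8, 2, 10],
                      [12, 4, 14, 6],
                      [3, 11, 1, 9],
                      [15, 7, 13, 5]])

def dither_pattern(shape, step):
    """Tile a zero mean 4x4 kernel with values inside plus or minus step / 2."""
    kernel = ((BAYER_4X4 + 0.5) / 16.0 - 0.5) * step
    reps = (-(-shape[0] // 4), -(-shape[1] // 4))      # ceiling division
    return np.tile(kernel, reps)[:shape[0], :shape[1]]

def dither_quantize(image, bits, lo=0.0, hi=1.0, subtract=False):
    """Add the dither, quantize, and optionally subtract the dither again."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    d = dither_pattern(image.shape, step)
    codes = np.clip(np.floor((image + d - lo) / step), 0, levels - 1)
    recon = lo + (codes + 0.5) * step
    return recon - d if subtract else recon

gray = np.full((8, 8), 0.5)                    # uniform mid-gray test patch
one_bit = dither_quantize(gray, bits=1)        # mixes the two output levels
```

Only the 16 kernel values have to be known at the receiving end, which
is the sense in which the synchronization problem is reduced.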

A few experiments were conducted to find (empirically and
theoretically) the color of the equivalent noise and its correlation
with the input signal, especially when deterministic dither patterns
are used. When random white noise was used the equivalent noise was
white even for colored inputs (for example fig. (1.6b)). The use of
a dither pattern is more troublesome in that it is hard to talk about
its statistics, especially second order. Even if one assumes that
averaging periodograms produces some information about the spectral
properties, one is faced with a more severe problem: the fact that
deterministic patterns are not space invariant, thus making the
entire idea of power spectral density invalid. However, the fact
that the power spectrum of the dither pattern itself is never
calculated, but rather the power spectrum of the equivalent noise,
leaves some hope for good results. It is expected though that the
equivalent noise be correlated with the input signal even for quite
fine quantization. Indeed, this fact was experimentally confirmed
and the power spectrum of the equivalent noise is a mixture of the
power spectrum of the input signal and the Fourier components of the
dither pattern.

We took the MILL, added a dither pattern, quantized to
1 bit/pel and then subtracted the dither pattern and the original to
leave us with the equivalent noise. The power spectrum estimate of
the noise is shown in fig (5.1a) and clearly demonstrates that this
is a blend of the image and the dither. The power spectrum of the
image shows up in the shape of the peaks, and the symmetric
distribution of the peaks is due to the symmetry that the dither has
in the frequency domain. When quantizing finer the shape of this
power spectrum becomes more flat (i.e. smaller max/min ratio) but
retains the shape of symmetrical peaks as shown in fig (5.1b) where
the same experiment was repeated with quantization of 3 bits/pel.
If the noise that is actually added is white and does not have a
stylized Fourier transform like dither patterns, the equivalent
noise is white as shown in fig (5.1c). In either case, if the
original signal is white the peaks disappear and the equivalent
noise is white. When using our system the signal entering the
quantizer is preprocessed and as described before the A filter
whitens the signal, thus making the equivalent noise white, which is
what we assumed when designing the preprocessor.

The amount of correlation between the signal and the
equivalent noise varies again depending upon what noise is actually
added. The amount of correlation turns out to be the smallest when
the input signal is white, which is what happens when applying our
preprocessor before quantization.
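
How such a correlation can be measured is sketched below. The example
is an illustration only: it uses a hypothetical white input confined
to the range where no truncation occurs, a two-level quantizer, and
uniformly distributed added noise; none of these are the data or the
exact procedure of the experiments reported here.

```python
import numpy as np

def one_bit_quantize(v):
    """Two-level quantizer on the interval 0 to 1: outputs 0.25 or 0.75."""
    return np.where(v < 0.5, 0.25, 0.75)

def correlation(a, b):
    """Sample correlation coefficient between two images."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

rng = np.random.default_rng(0)
x = rng.uniform(0.25, 0.75, (256, 256))       # hypothetical white input signal
n = rng.uniform(-0.25, 0.25, x.shape)         # added noise with peak K = 0.25

# Equivalent noise = output minus input, for two variants of the scheme.
noise_plain = one_bit_quantize(x) - x                 # nothing added before quantizing
noise_sub = (one_bit_quantize(x + n) - n) - x         # noise added, then subtracted

corr_plain = correlation(noise_plain, x)
corr_sub = correlation(noise_sub, x)
print(corr_plain, corr_sub)
```

In this sketch the equivalent noise of the plain quantizer is clearly
correlated with the input, while adding and then subtracting the noise
leaves an equivalent noise that is nearly uncorrelated with it.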

The two originals of fig (1.6a) and fig (1.7a) (with
9 bits/pel) were each processed using 1 bit/pel and applying pseudo
random noise or dither with and without subtraction at the retrieval
time. The originals used were carefully selected. The first
picture is a very "busy" one with a lot of detail. The piece of sky
at the top right is of great importance because all the side effects
of the process will be clearly seen there whereas they are hidden
(although existing, of course) in the other busy sections of the
picture. The power spectrum of the image (an estimate of which is
given on a db scale in fig (1.6b)) has a lot of energy in off-axes
frequencies because of the huge amount of diagonal structure in the
original image. The second picture differs quite a bit from the
first in that there are almost no man made objects in the scene.
This manifests itself in the fact that most of the energy is
concentrated along the axes of the power spectrum (given on a db
scale in fig (1.7b)).

To demonstrate visually the operation of the system some
"internal" signals of the process (designed for P = P_c) are shown in
fig. (5.2). Fig. (5.2a) shows the signal after preprocessing where
the contours have been enhanced and the low frequencies almost
removed. Fig. (5.2b) shows the signal after being quantized to
1 bit/pel (dither was added before quantization took place), and
fig. (5.2c) shows the signal prior to postprocessing, i.e. after
dither was subtracted. The same internal signals when random white
noise was used instead of dither are shown in fig. (5.3). The
experiment was repeated for a signal to noise ratio of P = 1. The
signal past the preprocessor, past the quantizer, and with the
random noise (which was added before quantizing) subtracted, are
shown in fig. (5.4).

Fig. 5.2  Internal signals (using P = P_c and dither)
(a) After preprocessing
(b) After quantizing to 1 bit/pel
(c) After subtraction of dither

Fig. 5.3  Internal signals (using P = P_c and random noise)
(a) After quantizing to 1 bit/pel
(b) After subtraction of random noise

In  figures  (5.5)  and  (5.S)  we  have  four  processed  versions  of 
each  of  the  two  originals  using  the  scheme  just  presented.  In  all 
versions  we  preprocessed  (using  P-Pe)  the  original,  added  noise 
(random  or  a dither  pattern),  and  quantized  to  1 bit/pel.  In 
flge.  (5.5a),  (5.5c),  (5.6a),  and  (5.6c)  we  also  subtracted  the 

noise  that  was  previously  added,  from  the  output  of  the  quantizer. 
All  signals  were  then  postprocessed  by  the  B filter.  The  version 
processed  with  dither  and  with  subtraction  seems  to  be  the  best 
looking  of  all.  In  all  images  details  are  very  well  preserved. 
Enhancement  done  by  the  author  on  the  shadowed  region  shows  that 
details  in  that  area,  although  hardly  visible  in  the  original,  are 
all  present  in  the  processed  versions.  Note  that  the  version 
processed  with  pseudo  random  noise  looks  a little  "dirty"  especially 
In  the  upper  right  section  of  the  picture.  In  that  domain,  in  the 
version  processed  with  dither,  the  careful  observer  will  find  a 
Fig. 5.5  MILL processed with 1 bit/pel and P=Pc=14.5.

Preprocessed, noise added, quantized, and postprocessed
using:

(a)  Pseudo  random  noise  with  subtraction 

(b)  Pseudo  random  noise  without  subtraction 

(c)  Dither  pattern  with  subtraction 

(d)  Dither  pattern  without  subtraction 
Fig. 5.6  BARN processed with 1 bit/pel and P=Pc=14.5.

Preprocessed, noise added, quantized, and postprocessed
using:

(a)  Pseudo  random  noise  with  subtraction 

(b)  Pseudo  random  noise  without  subtraction 

(c)  Dither  pattern  with  subtraction 

(d)  Dither  pattern  without  subtraction 
dither pattern somewhat embedded in the background. The small
difference between the results using dither with and without
subtraction suggests of course the use of the method without the
subtraction of the dither.

The reason that subtraction in the case of dither improves the
final image only slightly is that the dither pattern was chosen such
that its energy is entirely concentrated in the very high
frequencies, which have the lowest visibility. At those frequencies
the B filter attenuates the signal considerably, thus reducing the
influence of the subtracted dither, which as indicated before has
low visibility to start with. In the case of pseudo random noise the
energy is distributed among all frequencies, and the low-frequency
component will get through the B filter and show up in the final
image.
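
The argument can be illustrated with a small numerical sketch. It is
not the experiment of this chapter: the checkerboard pattern stands in
for a dither whose energy sits at the highest spatial frequency, and a
4 x 4 moving average stands in for a lowpass postfilter. Both choices,
and all names below, are assumptions made only for illustration.

```python
import numpy as np

def box_lowpass(img, k=4):
    """Crude stand-in for a lowpass postfilter: k x k circular moving average."""
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += np.roll(img, (dy, dx), axis=(0, 1))
    return out / (k * k)

rng = np.random.default_rng(0)
n, amp = 64, 0.5

# Assumed dither: a checkerboard, so all energy is at the highest frequency.
yy, xx = np.indices((n, n))
dither = amp * np.where((xx + yy) % 2 == 0, 1.0, -1.0)

# White noise of the same standard deviation: energy at all frequencies.
white = rng.normal(0.0, amp, size=(n, n))

# Mean squared value left after lowpass filtering.
dither_residual = np.mean(box_lowpass(dither) ** 2)
white_residual = np.mean(box_lowpass(white) ** 2)
print(dither_residual, white_residual)
```

Under these assumptions the filtered checkerboard vanishes while a
low-frequency part of the white noise survives, which is the
qualitative behaviour described above.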

In the original Roberts’ scheme the subtraction of the noise
plays an important role, whereas in our scheme it seems to have a
much smaller effect. This big difference lies in the fact that we
have the postprocessor at our disposal to reduce the effect of the
subtracted noise. Following the path this noise goes through, we
find that it is first lowpassed by the B filter to leave mainly low
frequencies. Next, when the picture is viewed by a human eye, the
noise passes through a filter which suppresses low frequencies, thus
reducing the effect of the components left by the postprocessor.


The dither pattern seen on these images is not the one
subtracted but is a pattern created by the process from the dither
added to the signal before being quantized. Attempts to "tailor"
the B filter to reduce the visibility of this pattern failed, mainly
because the equivalent noise in this rough quantization was too
correlated with the signal. In finer quantization this pattern has
much less effect.

The results using our coding scheme with much lower S/N ratios
are demonstrated in fig. (5.7). Here we have processed the MILL
with 1 bit/pel using a signal to noise ratio of 2 in figs. (5.7a)
and (5.7b) with dither and random noise respectively, and a ratio of
1 in figs. (5.7c) and (5.7d) with dither and random noise
respectively. The observer will notice immediately the presence of
a dither pattern (or noise) all over the scene, even in busy regions.
The detail, though, is preserved far better than in the straight
Roberts method.

The reduction of the S/N ratio implies in our case the
reduction of the entropy, which when applying a source coding scheme
plays an important role in the number of bits/pel needed. When
using the quantization method the way we did, this ratio is much
less important since the quantizer cannot go below 1 bit/pel. It
should, however, be chosen to match the "Roberts Box" with the rest
of the system as explained later in this chapter. Reduction of the
S/N ratio can be achieved by either increasing the noise level or
decreasing the total signal power. The reduction in signal power is
not done by merely attenuating the signal but also by changes in the
shape of the filter (to retain the optimality of the design), which
may possibly inhibit some of the frequencies from being transmitted

Fig. 5.7  MILL coded at 1 bit/pel and low S/N ratios with subtraction

(a)  P=2 and dither

(b)  P=2 and random noise

(c)  P=1 and dither

(d)  P=1 and random noise


and thus cause the resultant image to be more blurred. As the results
indicate, when using the quantizer method (rather than source
coding) it is not advisable to reduce the S/N ratio to such low
levels but rather to choose some intermediate value.

A very important issue in Roberts’ method is the truncation.
Roberts assumed that the entire range is divided into the desired
number of levels, and the truncation occurs when the addition of
noise causes overflow of this range. In our case the noise is given
in advance and thus the quantization levels are predetermined. In
the optimum case the preprocessor should take precaution to
compensate for (or avoid) truncation. However this is a nonlinear
operation and the A filter is not optimized for this truncation.
Moreover, the fact that A is designed for an average image may cause
more than average truncation for some members of the ensemble. We
can demonstrate the effect of truncation by the following
experiments. We let the quantizer have infinitely many levels (i.e.
no truncation) but otherwise leave the process unchanged. The
result of this experiment is demonstrated in fig. (5.8a) and should
be (ensemble wise) equivalent to fig. (2.1c), since the Roberts
process without truncation is equivalent to the addition of noise.
Alternately one can simulate the Roberts process by inserting a
"clipper" before random noise is added. This is shown in
fig. (5.8b), which should be (ensemble wise) equivalent to
fig. (5.5a). Thus the differences between figs. (2.1c) and (5.5a) or
between figs. (5.7d) and (2.3c) are all due to truncation. The
flare that is seen near some of the contours is due to severe


Fig. 5.8  Truncation experiment

(a)  Using  random  noise,  quantizing  without  truncation, 
and  subtraction  of  noise 

(b)  Using  random  noise  with  clipping  (without  quantizing 
truncation in those regions.
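
The role of truncation can be sketched numerically. The code below is
a minimal illustration of a Roberts-type coder (add noise, quantize,
subtract the same noise) and not the program used for the figures. The
uniform mid-rise quantizer, the noise uniform over one quantization
step, the test ramp, and the function name roberts_code are all
assumptions.

```python
import numpy as np

def roberts_code(signal, noise, step, n_levels=None):
    """Add noise, quantize uniformly, then subtract the same noise.

    n_levels=None models a quantizer with unlimited levels (no truncation);
    otherwise the level index is clipped to n_levels symmetric levels.
    """
    noisy = signal + noise
    idx = np.floor(noisy / step)                 # mid-rise quantizer index
    if n_levels is not None:
        half = n_levels // 2
        idx = np.clip(idx, -half, half - 1)      # truncation of out-of-range values
    quantized = (idx + 0.5) * step
    return quantized - noise

rng = np.random.default_rng(0)
step = 1.0
signal = np.linspace(-2.0, 2.0, 401)             # ramp exceeding the 1-bit range
noise = rng.uniform(-step / 2, step / 2, size=signal.shape)

no_trunc = roberts_code(signal, noise, step)             # unlimited levels
one_bit = roberts_code(signal, noise, step, n_levels=2)  # two levels only

err_no_trunc = np.max(np.abs(no_trunc - signal))
err_one_bit = np.max(np.abs(one_bit - signal))
print(err_no_trunc, err_one_bit)
```

Without truncation the error never exceeds half a quantization step,
so the coder behaves like a bounded additive noise source. With two
levels, samples outside the quantizer range pick up a much larger
error, which corresponds to the flare attributed to truncation.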

As mentioned before there are two signal to noise measures to
be considered and in some way to be matched. The first quantity, P
(or alternately μ) mentioned in chapter one, affects the design of
the filters. The second is a signal to noise ratio implied by the
Roberts box and the truncation that takes place inside it. Those
two quantities need not necessarily be the same, but one ought to
choose P (or μ) to yield a signal that will suffer least
truncation. It is up to the user to decide whether fig. (5.7) (with
very little truncation) or fig. (5.5) (with more truncation but
better dither suppression) is more to his liking.

Some comparison between the results using Roberts’ method and
the system we suggested will be outlined here. In figs. (5.9) and
(5.10) we have processed the MILL by Roberts’ method using 1 and 2
bits/pel, applying pseudo random noise and a dither pattern, and in
each case we demonstrate the results with and without subtraction,
i.e. the way figs. (5.5) and (5.6) were produced except without pre-
and postfiltering. All the versions of fig. (5.9) look quite dull
and a lot of information is gone. The pictures of fig. (5.10) look
much better and the user whose objective is to produce sharp looking
images will probably prefer this kind of image. However, some
information is lost due to this kind of processing. Any observer
will note immediately the presence of the dither pattern all over
the image. Its existence is seen not only in the upper right corner
but rather all over the scene. Some information appearing on the
original is lost. The careful observer will note that the


Fig. 5.9  MILL processed with 1 bit/pel without filtering and

(a)  Pseudo random noise with subtraction

(b)  Pseudo random noise without subtraction

(c)  Dither pattern with subtraction

(d)  Dither pattern without subtraction


Fig. 5.10  MILL processed with 2 bit/pel without filtering and

(a)  Pseudo  random  noise  with  subtraction 

(b)  Pseudo  random  noise  without  subtraction 

(c)  Dither  pattern  with  subtraction 

(d)  Dither  pattern  without  subtraction 
electrical wires that go across the sky in the upper right corner
have disappeared and also some detail in the background, seen
through the building, is gone.

The images processed with our system (we refer, for example,
to fig. (5.5c)) tend to be more smeared and hazy. The total
appearance of the image is not so sharp (remember, though, that fig.
(5.5c) is processed with 1 bit/pel) but information is preserved far
better. Note for example that the dither pattern is much less
conspicuous. This stems from the fact that the B filter acts as a
blurring system on the dither and considerably attenuates the
frequencies in which the dither has most of its energy. The pattern
that is still seen on the picture results from the dither added
before the quantization and not the one that was subtracted later.
The electrical wires mentioned before are clearly seen in this
image.

We can summarize the comparison by stating that the Roberts
method creates (for 2 and more bits/pel, but not for 1 bit/pel)
sharper looking images but on the other hand tends to omit some of
the fine detail and makes the pseudo random noise or dither pattern
more prominent. The system we suggest produces pictures that look
more hazy but which preserve more detail. The dither (or any other
noise pattern) in our method is much less conspicuous than in the
regular Roberts method.

To demonstrate the power of this method and the improvement
one gets when more than 1 bit/pel is used we have processed the
"MILL" using dither and quantizing to 2 and 3 bits/pel. The results


Fig. 5.11  MILL processed with more than 1 bit/pel,
dither pattern with subtraction

(a)  2 bit/pel 

(b)  3 bit/pel 


are given in fig. (5.11). Compare figs. (5.9) and (5.10) (that use
Roberts’ method with no prefiltering) with fig. (5.11) to verify the
improvement one gets when using our method over the regular Roberts
method or the dither method.

Note how close these versions are to the original and how with
3 bits per picture element most of the flare typical of this
processing is gone, since less truncation took place.
CHAPTER  SIX
CONCLUSIONS 

This  work  attempted  to  incorporate  some  properties  of  human 
vision  into  the  processing  of  images  specifically  for  image 
transmission  and  bandwidth  reduction. 

The incorporation of a model of human vision seemed to be
important since the goal of any image processing is to produce an
image to be viewed through this mechanism. The system we suggested
includes a preprocessor, the output of which is transmitted through a
noisy channel whose statistics are assumed to be known, and a
postprocessor that undoes the preprocessing, taking into account the
disturbances that might have occurred in the transmission process.
Our goal in bandwidth reduction was to employ a coding scheme that
is independent of the source, so that the source statistics, which
are usually not available, are not required.

By employing the least squares criterion for optimization we
derived the optimal pre- and postprocessors and investigated the
sensitivity of the resultant system to the visual model used. As it
turns out there is very little dependence of the system on the
visual model and a much heavier dependence on the statistics of the
channel and the ensemble of images.

An important point to consider is the optimization criterion
itself. The minimum least squares criterion is not always adequate
for image processing. Some of the results obtained depend on the
criterion chosen. Unfortunately there are not very many criteria
whose characteristics have been thoroughly investigated and whose
mathematical properties are in some way tractable. One of the most
important consequences of using this criterion is the fact that the
entire system is not very sensitive to the fine structure of the
filters. As long as the filters are a matching pair and the
prefilter follows the general outlines desired, the performance of
the system will be satisfactory. The fine structure of the
prefilter will usually not improve the performance of the system
significantly. This point has been noticed before by others and is
emphasized in this work by comparing the processed versions of the
same original with two different sets of filters.

The attempt to employ a coding scheme has been shown to be
worthwhile. We have used Roberts' method for quantization and the
results, going down to 1 bit per picture element, are better than
anticipated. Comparison of images coded by Roberts' method and
images coded by our scheme shows quite an improvement in the number
of bits/pel needed.

Comparison of the system suggested here with the one
suggested by Mannos and Sakrison calls for more detail. Mannos and
Sakrison’s work uses a source coding scheme and therefore may yield
results with an average of less than 1 bit per picture element, which
is the lower bound of our scheme. Mannos and Sakrison's work sets
an upper bound on the number of bits per sample needed to code that
source, but unfortunately the code requires the knowledge of the
source statistics, which is unavailable. Our scheme, though
requiring more bits for the code (per sample) than the bound set by
Mannos and Sakrison, is very easy to implement and produces
satisfactory results.

In view of the previous discussion it is suggested to merge
the two schemes. Some of the equations derived in the two works
bear a lot of resemblance, which emphasizes their closeness. As of
this writing no measurement has been made to compare the results of
Mannos and Sakrison with those obtained in this work on a
quantitative basis. It is suggested to apply Mannos and Sakrison’s
work to the optimal system derived here. This may relate the signal
to noise ratio to a distortion measure and will yield a better
understanding, on a quantitative basis, of how well this coding
scheme does compared to the theoretical bound.

For the purpose of coding, it is not obvious that the random
noise or dither pattern used produces the best results. A different
pattern may improve the performance. Moreover, a special
investigation of the quantization levels is required. We chose the
quantization level to be 10% of the maximum value of the original
image in question. The choice is arbitrary and more experiment and
research are needed to establish firm rules.


As to the comments on the visual system, very few have tried
to analyze the visual system from a physical point of view. Most of
the explanations, as should be, are results of psychophysical
experiments. In this author’s opinion those psychophysical results
should be analyzed also by physical consideration. The remarks made
in this work are by no means complete but should be viewed as
suggestions for future researchers in this area. The physical
explanation of natural phenomena, of which human vision is one
instance, is of importance not only for understanding of the
mechanism as such but to assist the design of artificial mechanisms
that interact with nature.


APPENDIX  A 


DERIVATION  OF  THE  EQUATIONS  FOR  A AND  B FILTERS  *


In this appendix we shall derive and detail the restrictions
on the systems A and B. The derivation refers to the block diagram
of fig. (1.5). †

We shall denote by a(x,y), b(x,y), and v(x,y) the impulse
responses of the systems A, B, and V respectively and by n(x,y)
the channel (random) noise. From fig. (1.5) we have

(A1)   $D_2' = D(x,y); \qquad D_1'(x,y) = [a \circledast D + n] \circledast b$

(A2)   $D_1' - D_2' = [a \circledast b - \delta(x,y)] \circledast D + b \circledast n$

where $\delta(x,y)$ is a two dimensional impulse function. Also we have

(A3)   $I''(x,y) = D'(x,y) \circledast v(x,y)$

Our aim is to find the linear systems A and B such that the quantity
M defined in (A4) is minimized.

(A4)   $M = E\{[I_1'' - I_2'']^2\} = E\{[(D_1' - D_2') \circledast v]^2\}$

where $E\{\cdot\}$ is the expected value operator and the random processes


* Results similar to those obtained here may be partially found in
Costas [1].

† Throughout this appendix, to alleviate the tedious typing
procedure, we shall frequently delete the arguments of the
functions. In cases where ambiguity may arise we adopt the notation
of using lower case letters for space domain functions and upper
case letters for the corresponding frequency domain function. The
two dimensional frequency $(f_x, f_y)$ will be denoted by (f).
are D(x,y) and n(x,y), which are assumed to be statistically
independent. Therefore, using (A2)

(A5)   $M = E\{[(a \circledast b - \delta) \circledast v \circledast D + n \circledast b \circledast v]^2\}$

       $\;\; = E\{[(a \circledast b - \delta) \circledast v \circledast D]^2\} + E\{[n \circledast b \circledast v]^2\}$

Using Parseval’s theorem and switching to the frequency domain we
get
(A6)   $M = \int |A(f)B(f) - 1|^2 |V|^2 S_d \, df + \int |B|^2 |V|^2 S_n \, df$  ‡

where $S_d(f)$ and $S_n(f)$ are the power spectra of the signal D(x,y) and
n(x,y) respectively. As explained in chapter one we also want the
energy of the signal being transmitted to be constrained, thus
requiring:

(A7)   $\int |A|^2 S_d \, df = K$

or by using noise units we define

       $K = p \int S_n \, df$

where p is a positive integer.

Let us define the quantity

(A8)   $H(f) = [\,|AB - 1|^2 S_d + |B|^2 S_n\,]\,|V|^2 + \lambda |A|^2 S_d$

in which λ is a constant called the Lagrange multiplier. Using the
calculus of variations for minimizing M in (A6) with the constraint
specified by (A7) yields a regular min-max problem on the function H
in terms of A and B [3]. Since A and B are in general complex
functions we have to solve the following set of four equations:

(A9)   $\partial H/\partial \mathrm{Re}(A) = 0; \quad \partial H/\partial \mathrm{Im}(A) = 0;$
       $\partial H/\partial \mathrm{Re}(B) = 0; \quad \partial H/\partial \mathrm{Im}(B) = 0$


‡ All the integrals in this appendix have the limits $-\infty$ to $+\infty$
unless otherwise stated.


Since each of the quantities in the left hand sides of equations
(A9) is real (note that H itself is real) we can equivalently solve
the following set of two complex equations

(A10)  $\partial H/\partial \mathrm{Re}(A) + j\,\partial H/\partial \mathrm{Im}(A) = 0$

       $\partial H/\partial \mathrm{Re}(B) + j\,\partial H/\partial \mathrm{Im}(B) = 0$

Performing the differentiation in equation (A10) we get:

(A11)  $S_d |V|^2 (AB - 1) B^* + \lambda S_d A = 0$

(A12)  $S_d |V|^2 (AB - 1) A^* + |V|^2 S_n B = 0$
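
As a check on (A11) and (A12), the differentiation can be written out
using the standard identity that the operator in (A10) equals twice
the derivative with respect to the complex conjugate, with A and its
conjugate treated as independent variables. This notation is not used
in the original derivation and is offered only as a sketch; the common
factor of 2 is dropped.

```latex
\begin{align*}
H &= \left[(AB-1)(A^*B^*-1)\,S_d + BB^*\,S_n\right]|V|^2 + \lambda\,AA^*\,S_d \\
\frac{\partial H}{\partial A^*} &= S_d\,|V|^2\,(AB-1)\,B^* + \lambda\,S_d\,A = 0 \\
\frac{\partial H}{\partial B^*} &= S_d\,|V|^2\,(AB-1)\,A^* + |V|^2\,S_n\,B = 0
\end{align*}
```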

We shall introduce the following notation in the rest of this
appendix (from here on δ denotes the ratio below, not the impulse
function):

       $\mu = 1/\lambda^{1/2}; \quad \delta = (S_n/S_d)^{1/2}; \quad \alpha = 1/(\mu |V|)$

Equations (A11) and (A12) can now be rewritten as

(A13)  $(AB - 1)\,B^* = -\alpha^2 A$

(A14)  $(AB - 1)\,A^* = -\delta^2 B$

Dividing (A13) by (A14) we get

       $B^*/A^* = (\alpha^2/\delta^2)\,(A/B)$

or

(A15)  $|B|^2 / |A|^2 = \alpha^2/\delta^2$

and multiplying (A13) by the complex conjugate of (A14) yields

       $(AB - 1)\,B^*\,(AB - 1)^* A = (-\alpha^2 A)(-\delta^2 B^*)$

(A16)  $|AB - 1|^2 = \alpha^2 \delta^2$

Solving (A13) for A(f) and (A14) for B(f) yields

(A17)  $A = B^* / (|B|^2 + \alpha^2)$

(A18)  $B = A^* / (|A|^2 + \delta^2)$
Equation (A18) is quite important. It states that B(f) is the
Wiener filter for a given A(f).
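
To make the connection explicit, (A18) can be rewritten by inserting
the definition of the noise to signal ratio from the notation above.
The result has the familiar Wiener form for restoring a signal with
spectrum $S_d$ that was filtered by A and corrupted by additive noise
with spectrum $S_n$. The rewriting below is a sketch added for
illustration; note that V does not appear in it.

```latex
\begin{equation*}
B = \frac{A^*}{|A|^2 + \delta^2}
  = \frac{A^*}{|A|^2 + S_n/S_d}
  = \frac{A^*\,S_d}{|A|^2\,S_d + S_n}
\end{equation*}
```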

Multiplying (A17) by the conjugate of (A18) gives

       $A B^* = [B^*/(|B|^2 + \alpha^2)]\,[A/(|A|^2 + \delta^2)]$

(A19)  $(|B|^2 + \alpha^2)\,(|A|^2 + \delta^2) = 1$

and substituting (A15) in (A19) yields the final expression:

(A20a) $|A|^2 = \delta/\alpha - \delta^2 = (\delta/\alpha)(1 - \alpha\delta)$
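
The intermediate algebra between (A19) and (A20a) is short and can be
written out as follows. The steps are a sketch added for the reader
and use only (A15) and (A19).

```latex
\begin{align*}
|B|^2 &= \frac{\alpha^2}{\delta^2}\,|A|^2
  && \text{from (A15)} \\
\left(\frac{\alpha^2}{\delta^2}\,|A|^2 + \alpha^2\right)\left(|A|^2 + \delta^2\right)
  &= \frac{\alpha^2}{\delta^2}\left(|A|^2 + \delta^2\right)^2 = 1
  && \text{substituted into (A19)} \\
|A|^2 + \delta^2 &= \frac{\delta}{\alpha}
  && \text{positive root} \\
|A|^2 &= \frac{\delta}{\alpha} - \delta^2 = \frac{\delta}{\alpha}\,(1 - \alpha\delta)
\end{align*}
```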

The left hand side of the latter equation is obviously (real and)
positive whereas the right hand side is not necessarily so. Equation
(A20a) should be interpreted (see [1], [3]) as

(A20b) $|A|^2 = \begin{cases} (\delta/\alpha)(1 - \alpha\delta) & \alpha\delta < 1 \\ 0 & \text{otherwise} \end{cases}$

Equation (A15) states that whenever |A| is zero so is |B|. At those
frequencies the phases of A and B are unimportant. We shall now
focus our attention on those frequencies for which $|A| \neq 0$ in trying
to find their phases.

Multiplying (A17) by (A18) yields

       $AB\,(|B|^2 + \alpha^2)\,(|A|^2 + \delta^2) = A^* B^*$

and using the result of (A19) we get

       $AB = (AB)^*$

which means that the quantity AB is real, thus

(A21)  $\angle A = -\angle B$

This is the only constraint imposed on the phases of A(f) and B(f)
and therefore we may set the phases of both systems to zero at all
frequencies. Note that since $S_d(f)$ and $S_n(f)$ are even functions the
choice of even V(f) makes A(f) even and thus a(x,y) will also be
real and even.

We now have to evaluate μ (or equivalently λ). We have only
to satisfy equation (A7), hence

(A22)  $K = \int |A|^2 S_d \, df = \int (\delta/\alpha)(1 - \alpha\delta)\,S_d \, df$

       $\;\; = \mu \int |V| (S_n S_d)^{1/2} \, df - \int S_n \, df$

This integral is evaluated over the region for which $\mu > (S_n/S_d)^{1/2} / |V|$.
This yields:

(A23)  $\mu = (p + 1) \int S_n \, df \Big/ \int |V| (S_n S_d)^{1/2} \, df$

There exists a smallest μ such that $\mu \ge (S_n/S_d)^{1/2}/|V|$ for all
frequencies. This is called the critical μ (denoted by $\mu_c$). For
$\mu > \mu_c$ all integrals are evaluated from $-\infty$ to $+\infty$ and all the
results may be obtained in closed form. For $\mu > \mu_c$ we have

       $|A|^2 = (S_n/S_d)^{1/2}\,[\,\mu |V| - (S_n/S_d)^{1/2}\,]$

       $B = A^* / [\,\mu |V| (S_n/S_d)^{1/2}\,]$

       $M = \left[\int |V| (S_n S_d)^{1/2} \, df\right] / \mu$
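
The closed-form results can be checked numerically. The sketch below
evaluates them on a one dimensional frequency grid and verifies the
power constraint (A7) and the expression for M against a direct
evaluation of (A6). The spectra, the shape chosen for V, the value of
p, and the function name optimal_filters are hypothetical choices made
for illustration; they are not the models used in this work.

```python
import numpy as np

def optimal_filters(Sd, Sn, V, p, df):
    """Zero-phase A and B from the closed-form results, valid for mu > mu_c."""
    delta = np.sqrt(Sn / Sd)                        # delta = (Sn/Sd)^(1/2)
    noise_power = np.sum(Sn) * df                   # integral of Sn
    weight = np.sum(np.abs(V) * np.sqrt(Sn * Sd)) * df
    mu = (p + 1) * noise_power / weight             # (A23)
    mu_c = np.max(delta / np.abs(V))                # critical mu on this grid
    if mu <= mu_c:
        raise ValueError("mu <= mu_c: the closed form does not apply")
    A = np.sqrt(delta * (mu * np.abs(V) - delta))   # |A|, phase set to zero
    B = A / (mu * np.abs(V) * delta)                # B = A*/(mu |V| delta)
    return A, B, mu

# Hypothetical 1-D spectra, chosen only for illustration.
f = np.linspace(-0.5, 0.5, 1001)
df = f[1] - f[0]
Sd = 1.0 / (1.0 + (f / 0.05) ** 2)                  # lowpass signal spectrum
Sn = np.full_like(f, 1e-3)                          # white channel noise
V = (0.2 + 10.0 * np.abs(f)) * np.exp(-4.0 * np.abs(f))  # assumed visual filter
p = 10

A, B, mu = optimal_filters(Sd, Sn, V, p, df)

power = np.sum(A**2 * Sd) * df                      # left side of (A7)
target = p * np.sum(Sn) * df                        # K = p * integral of Sn
M_direct = np.sum(((A * B - 1) ** 2 * Sd + B**2 * Sn) * V**2) * df  # (A6)
M_closed = np.sum(np.abs(V) * np.sqrt(Sn * Sd)) * df / mu
print(mu, power, target, M_direct, M_closed)
```

Because every step is pointwise algebra followed by the same
summation, the two pairs of printed quantities agree to rounding
error whenever the closed form applies.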

The only items that remain to be taken care of are the special
cases: $S_n = 0$, $S_d = 0$ and $V = 0$. By looking at equations (A11) and (A12)
we note that if for some frequency V(f)=0, equation (A12) holds and
equation (A11) yields A(f)=0, and therefore B(f) may be set to zero
for that frequency. This is only a hypothetical case since the
human visual system is not "blind" to any frequency.


If $S_d(f) = 0$ for some frequency, equation (A11) holds and
equation (A12) yields B(f)=0, and A(f) is arbitrary at that frequency
and may thus be set to zero to save power. This case means that in
the entire ensemble there is not one single member that has a
nonzero component in this frequency. In the ensemble of regular
images, with which we are dealing, this does not happen.

If $S_n = 0$ for some frequency we have from equation (A14)
$B(f) = A^{-1}(f)$, which when substituted into (A13) gives A(f)=0, and
B(f) is undefined (also A has no inverse). This situation has to be
interpreted in the following way. If $S_n(f) = 0$ it means that there
will never be a noise component in this frequency and the
information conveyed by this frequency can be accurately restored.
But since we want to use the least power possible for sending this
information we may set $A(f) = \epsilon$, where ε is arbitrarily small, and
then make $B(f) = 1/\epsilon$ in this frequency, so that the information is
truly restored and a very small amount of power used. Practically we
do not expect nature to be that courteous so this situation, although
possible, is very rare.